
Conversation

@varun-sundar-rabindranath
Contributor

@varun-sundar-rabindranath varun-sundar-rabindranath commented Nov 21, 2025

Purpose

On B200, the following command fails,

VLLM_ALL2ALL_BACKEND=deepep_low_latency VLLM_USE_DEEP_GEMM=1  vllm serve  Qwen/Qwen3-30B-A3B-FP8 --trust-remote-code --tensor-parallel-size 1 --data-parallel-size 4 --enable-expert-parallel --no-enable-prefix-caching  --port 9010 --enable-eplb  --eplb-config '{"window_size":10,"step_interval":100,"num_redundant_experts":0,"log_balancedness":true}' --enforce-eager

On B200, when we use DeepGEMM, we transpose the MoE weight_scale tensors for efficient DeepGEMM matmuls. This, in combination with EPLB, causes the following assertion to fail,

        assert all(
            weight.is_contiguous()
            for name, weight in weights
            if not name.startswith("_shared_experts.")
        )

as the weight scale tensors are not contiguous.

Fix: This PR changes the view of the tensor so the is_contiguous() check passes. We also add a test to verify that this view update is safe.
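
For context, a minimal sketch of the failure and the fix idea, assuming a 3D (num_experts, M, N) scale tensor whose last two dims are transposed for DeepGEMM; the helper below is a simplified, hypothetical stand-in for the PR's `_maybe_make_contiguous`, not the actual vLLM code:

```python
import torch

num_experts, m, n = 4, 8, 16

# The DeepGEMM path keeps the per-expert weight scales transposed. The
# transpose is only a stride trick, so the tensor reports non-contiguous
# even though the underlying storage is dense.
w_scale = torch.randn(num_experts, m, n).transpose(1, 2)
assert not w_scale.is_contiguous()  # this is what trips the EPLB assert

def maybe_make_contiguous(t: torch.Tensor) -> torch.Tensor:
    """Simplified, hypothetical stand-in for the PR's _maybe_make_contiguous."""
    if t.is_contiguous():
        return t
    # Undo the stride trick: the result aliases the same storage, keeps the
    # expert dim first, and is contiguous. EPLB only shuffles whole expert
    # slices along dim 0, so the inner logical layout does not matter.
    return t.transpose(1, 2)

fixed = maybe_make_contiguous(w_scale)
assert fixed.is_contiguous()
assert fixed.data_ptr() == w_scale.data_ptr()  # no copy, same storage
```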

Test Plan

tests/distributed/test_eplb_fused_moe_layer.py passes

lm-eval test

server command: VLLM_ALL2ALL_BACKEND=deepep_low_latency VLLM_USE_DEEP_GEMM=1  vllm serve  Qwen/Qwen3-30B-A3B-FP8 --trust-remote-code --tensor-parallel-size 1 --data-parallel-size 4 --enable-expert-parallel --no-enable-prefix-caching  --port 9010
lm_eval --model local-completions --tasks gsm8k --model_args model=Qwen/Qwen3-30B-A3B-FP8,base_url=http://localhost:9010/v1/completions,num_concurrent=30,max_retries=3 --limit 100

Test Result

server command + --enable-eplb --eplb-config '{"window_size":10,"step_interval":100,"num_redundant_experts":0,"log_balancedness":true}'

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.88|±  |0.0327|
|     |       |strict-match    |     5|exact_match|↑  | 0.92|±  |0.0273|

server command (without eplb)

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.87|±  |0.0338|
|     |       |strict-match    |     5|exact_match|↑  | 0.89|±  |0.0314|

@mergify

mergify bot commented Nov 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @varun-sundar-rabindranath.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 21, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug that occurs on B200 hardware when using DeepGEMM in combination with EPLB, where transposed, non-contiguous MoE weight scale tensors cause an assertion failure. The fix involves changing the tensor's view to be contiguous before it's processed by the EPLB logic. The PR also introduces a new test to validate the fix and refactors some distributed testing utilities into a shared file, which is a good improvement. The overall approach is sound and the changes are well-implemented. I have one minor suggestion to improve the clarity of a docstring for future maintainability.

@varun-sundar-rabindranath
Contributor Author

@codex review

@varun-sundar-rabindranath
Contributor Author

cc @elvircrn @ilmarkov @SageMoore @abmfy PTAL! Thanks 🙌

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!


@ilmarkov
Contributor

Looks good to me! Thank you for the fix!


weights = list(self.named_parameters())
weights = [(name, _maybe_make_contiguous(name, p)) for name, p in weights]


Probably, instead of the is_contiguous check, we should check that the tensor is row-major with num_local_experts in the first dimension. Whether the tensor is contiguous in the other dimensions does not matter, as we flatten the view over those dimensions.
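
A minimal sketch of what such a check might look like, under the assumption that what EPLB really needs is for each expert's data to occupy one dense slab along dim 0. The helper name is hypothetical, and a complete check would also have to verify that the inner dims tile the slab without gaps or overlap:

```python
import torch

def is_expert_major(t: torch.Tensor, num_local_experts: int) -> bool:
    # Each expert must own one dense, non-overlapping slab along dim 0 so
    # that whole expert slices can be moved as flat chunks of memory.
    if t.dim() < 2 or t.size(0) != num_local_experts:
        return False
    elems_per_expert = t.numel() // num_local_experts
    return t.stride(0) == elems_per_expert

num_experts, m, n = 4, 8, 16
transposed_scale = torch.randn(num_experts, m, n).transpose(1, 2)
assert not transposed_scale.is_contiguous()           # fails the current check
assert is_expert_major(transposed_scale, num_experts)  # but is still expert-major
```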

@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 21, 2025
@varun-sundar-rabindranath
Contributor Author

Looks like pre-commit is failing on unrelated files in vLLM PRs; saw the same failure here: #29188

Signed-off-by: Varun Sundar Rabindranath <[email protected]>
Member

@abmfy abmfy left a comment


The current fix LGTM. Thanks!

@abmfy
Member

abmfy commented Nov 21, 2025

The point of the assertion is to ensure that .view(num_experts, -1) does not fail. I have not actually tested this, but as @ilmarkov suggested, we could simply check whether the tensor is row-major and then remove that view operation entirely.
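
A small illustration of that point: on the transposed (non-contiguous) scales, .view(num_experts, -1) raises, which is exactly what the assertion guards against. Simplified shapes, not vLLM code:

```python
import torch

num_experts, m, n = 4, 8, 16
# Non-contiguous, as in the DeepGEMM path on B200.
scale = torch.randn(num_experts, m, n).transpose(1, 2)

try:
    scale.view(num_experts, -1)  # the operation the assertion protects
except RuntimeError as err:
    print(f"view() fails on the transposed scales: {err}")
```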

@vllm-bot vllm-bot merged commit 3137991 into vllm-project:main Nov 21, 2025
52 of 55 checks passed

Labels

ready ONLY add when PR is ready to merge/full CI is needed
